ABSTRACT


In the literature of point estimation, the Cauchy distribution with a location
parameter is often cited as an example of the failure of the maximum likelihood
method and hence of the failure of the likelihood principle in general. Contrary
to this notion, we prove that, even in this case where the likelihood equation
has multiple roots, the maximum likelihood estimator (the global maximum)
remains an asymptotically optimal estimator in the Bahadur sense.


AMS  1970  Subject  Classification:  Primary  62F20  Secondary  62E10. 

Key Words and Phrases: Likelihood function, maximum likelihood estimator,
likelihood equation, Cauchy distribution, consistent estimator, first-order
efficient, second-order efficient.


1. Introduction


For point estimation of the parameter $\theta$, the likelihood principle (see
Fisher (1922, 1925)) yields the maximum likelihood estimator (m.l.e.)
$\hat\theta$. There is a considerable literature studying various properties of
the m.l.e. The use of the m.l.e. has a long history and may go back to Gauss
and Edgeworth. General reviews can be found in the articles by Edwards (1972)
and Norton (1972). The maximum likelihood method has been a controversial and
emotional issue throughout the history of statistical point estimation. Many
recent articles are still concerned with this issue; to name a few, Berkson
(1953, 1980), Efron (1975, 1982), Kraft and LeCam (1956), Rao (1980), Ferguson
(1982), Fu (1982), and Reeds (1985).

Let $X_1, \ldots, X_n$ be n independent identically distributed (i.i.d.)
observations having density function $f(x \mid \theta)$, where $\theta$ is a
fixed value in the parameter space $\Theta$. Given the data
$s = (x_1, \ldots, x_n)$, the likelihood function of $\theta$ is defined by

(1.1)  $L_n(\theta \mid s) = \prod_{i=1}^{n} f(x_i \mid \theta).$

For given s, the maximum likelihood estimator $\hat\theta_n(s)$ for $\theta$ is
a value in the parameter space $\Theta$ which maximizes the likelihood function
(1.1); i.e.

(1.2)  $L_n(\hat\theta_n(s) \mid s) = \max_{\theta \in \Theta} \prod_{i=1}^{n} f(x_i \mid \theta).$

The standard method to obtain the maximum likelihood estimator
$\hat\theta_n(s)$ is to find the root (or roots) of the following equation

(1.3)  $\ell_n^{(1)}(\theta \mid s) = \frac{\partial}{\partial\theta} \log L_n(\theta \mid s) = 0.$

Equation (1.3) will be referred to as the likelihood equation.


Cramer (1946, p. 500) proved that, under certain regularity conditions, there
exists a sequence of roots $\hat\theta_n(s)$ of the likelihood equation (1.3)
which converges in probability to $\theta$ as n tends to infinity. Since then
the consistency (or strong consistency) of the maximum likelihood estimator has
been studied by many researchers, for example, Wald (1949), Wolfowitz (1953,
1965) and LeCam (1953, 1970). Under certain regularity conditions, the maximum
likelihood estimator $\hat\theta_n(s)$ is also asymptotically normally
distributed with asymptotic variance $v(\theta)$ achieving the Cramer-Rao lower
bound; i.e., if an estimator $T_n(s)$ satisfies

(1.4)  $\sqrt{n}\,(T_n(s) - \theta) \to N(0, v(\theta))$, as $n \to \infty$,

then

(1.5)  $v(\theta) \ge 1/I(\theta)$,

where $I(\theta)$ is the Fisher information and equality holds when $T_n(s)$ is
the maximum likelihood estimator. Hence the m.l.e. is an asymptotically
efficient estimator.

The example most often cited in the literature for the failure of the
likelihood principle is that of observations sampled from a Cauchy distribution
with a location parameter $\theta$. For given $s = (x_1, \ldots, x_n)$ it has
likelihood function

(1.6)  $L_n(\theta \mid s) = \prod_{i=1}^{n} \frac{1}{\pi(1 + (x_i - \theta)^2)}$

and likelihood equation

(1.7)  $\ell_n^{(1)}(\theta \mid s) = -\sum_{i=1}^{n} \frac{2(\theta - x_i)}{1 + (x_i - \theta)^2} = 0.$

The major reasons that the Cauchy distribution is often cited as an example of
the failure of the maximum likelihood method of estimation, and hence of the
failure of the likelihood principle in general (for example, Berkson (1980),
Ferguson (1978) and Reeds (1985)), are as follows:


(a)  The likelihood equation (1.7) is equivalent to a polynomial equation of
degree 2n-1. Hence it has 2n-1 roots (real and complex). The number of roots
increases as the sample size increases.

(b)  Neither analytical nor numerical solutions of the likelihood equation
(1.7) can be obtained easily when the sample size is moderately large.

(c)  All the real roots but one (the global maximum, i.e. the m.l.e.) of the
likelihood equation tend to $+\infty$ or $-\infty$ in probability as
$n \to \infty$ (see Reeds (1985)).

(d)  The asymptotic efficiency of the maximum likelihood estimator (the global
maximum) still remains unknown.
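Points (a) and (b) can be made concrete on a small sample. The following is a minimal sketch, not a method taken from this paper: it scans the interval between the smallest and largest observation (outside of which the derivative in (1.7) cannot vanish) for sign changes, refines each by bisection, and picks the root with the largest likelihood. The data set, the grid step and the helper names `score`, `log_lik` and `real_roots` are assumptions made for illustration only.

```python
import math

def score(theta, xs):
    """Derivative of the Cauchy log-likelihood, i.e. the left side of (1.7)."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)

def log_lik(theta, xs):
    """Cauchy log-likelihood up to the additive constant -n*log(pi)."""
    return -sum(math.log1p((x - theta) ** 2) for x in xs)

def real_roots(xs, grid_step=0.01, pad=1.0):
    """Find real roots of (1.7) from sign changes on a grid, refined by bisection."""
    lo, hi = min(xs) - pad, max(xs) + pad
    n_steps = int((hi - lo) / grid_step)
    roots = []
    a, fa = lo, score(lo, xs)
    for k in range(1, n_steps + 1):
        b = lo + k * grid_step
        fb = score(b, xs)
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0.0:
            left, right, f_left = a, b, fa
            for _ in range(60):  # bisection on the bracketing interval
                mid = 0.5 * (left + right)
                f_mid = score(mid, xs)
                if f_left * f_mid <= 0.0:
                    right = mid
                else:
                    left, f_left = mid, f_mid
            roots.append(0.5 * (left + right))
        a, fa = b, fb
    return roots

# Hypothetical data: three clustered points and one distant observation.
sample = [-0.3, 0.1, 0.4, 12.0]
roots = real_roots(sample)
mle = max(roots, key=lambda r: log_lik(r, sample))
print("real roots of the likelihood equation:", roots)
print("global maximizer (m.l.e.):", mle)
```

For this sample the likelihood equation has several real roots, and only the one inside the cluster is the global maximum. A grid scan of this kind can miss roots that are closer together than the grid step, so it is an illustration rather than a general-purpose solver.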

The main purpose of this paper is to show that the maximum likelihood
estimator $\hat\theta_n(s)$ (the global maximum) converges to $\theta$
exponentially and is an asymptotically efficient estimator in the Bahadur sense.

2.  Main  Results 

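Fu (1971, 1973) proved that for any consistent estimator $T_n(s)$ and
$\varepsilon > 0$

(2.1)  $\liminf_{n\to\infty} \frac{1}{n} \log P(|T_n(s) - \theta| \ge \varepsilon) \ge -B(\theta, \varepsilon)$

and

(2.2)  $\liminf_{\varepsilon\to 0}\, \liminf_{n\to\infty} \frac{1}{n\varepsilon^2} \log P(|T_n(s) - \theta| \ge \varepsilon) \ge -I(\theta)/2,$

where $I(\theta)$ is the Fisher information and $B(\theta, \varepsilon)$ is the
Bahadur bound defined by

(2.3)  $B(\theta, \varepsilon) = \inf_{\theta'} \{K(\theta', \theta) : |\theta' - \theta| > \varepsilon\}$

and the Kullback-Leibler information $K(\theta', \theta)$ is given by

(2.4)  $K(\theta', \theta) = \int \left(\log \frac{f(x \mid \theta')}{f(x \mid \theta)}\right) f(x \mid \theta')\, dx.$

The inequalities (2.1) and (2.2) provide an important conclusion in the large
sample theory of estimation: for any consistent estimator $T_n(s)$ the
$\varepsilon$-tail probability

(2.5)  $\alpha_n(T_n, \theta, \varepsilon) = P(|T_n(s) - \theta| \ge \varepsilon)$

cannot tend to zero faster than the rate
$\exp\{-n[B(\theta, \varepsilon) + o(1)]\}$ (for fixed $\varepsilon$) or the
rate $\exp\{-n I(\theta)\varepsilon^2/2\}$ (for $\varepsilon$ near zero). An
estimator achieving the bound (2.2) is referred to as a first-order efficient
estimator in the Bahadur sense.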
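As an illustration of (2.3) and (2.4), the Kullback-Leibler information between two Cauchy densities with different location parameters can be evaluated numerically. The sketch below is ours and rests on two assumptions that are not taken from the text: the substitution x = theta' + tan(u), under which f(x | theta') dx = du / pi, and the closed form log(1 + (theta' - theta)^2 / 4), a standard result for Cauchy densities of equal scale, used here only as a comparison value. The function name `kl_cauchy` is hypothetical.

```python
import math

def kl_cauchy(theta_alt, theta, n_steps=2000):
    """K(theta', theta) of (2.4) for unit-scale Cauchy densities (midpoint rule)."""
    h = math.pi / n_steps
    total = 0.0
    for k in range(n_steps):
        u = -math.pi / 2 + (k + 0.5) * h
        x = theta_alt + math.tan(u)
        # log of f(x | theta') / f(x | theta); the factors 1/pi cancel.
        total += math.log((1.0 + (x - theta) ** 2) / (1.0 + (x - theta_alt) ** 2))
    # Under x = theta' + tan(u), f(x | theta') dx = du / pi.
    return total * h / math.pi

for eps in (1.0, 0.5, 0.1):
    k_num = kl_cauchy(eps, 0.0)
    k_closed = math.log(1.0 + eps ** 2 / 4.0)  # assumed closed form
    print(eps, k_num, k_closed, k_num / eps ** 2)
```

Since the closed form is increasing in |theta' - theta|, the infimum in (2.3) is approached at the boundary |theta' - theta| = epsilon, and the printed ratio K / epsilon^2 moves toward 1/4 as epsilon decreases.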

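For given s, let $\bar\theta_n(s)$ and $\underline\theta_n(s)$ be the largest
and the smallest real roots of the likelihood equation (1.3) respectively, and
let the maximum likelihood estimator $\hat\theta_n(s)$ be a root of (1.3) which
maximizes the likelihood function (1.1) (i.e. $\hat\theta_n(s)$ is a global
maximum). It follows that

(2.6)  $\underline\theta_n(s) \le \hat\theta_n(s) \le \bar\theta_n(s)$

and for any $\varepsilon > 0$, the following inequalities hold:

(2.7)  $P(\underline\theta_n(s) > \theta + \varepsilon) \le P(\ell_n^{(1)}(\theta + \varepsilon \mid s) > 0) \le P(\bar\theta_n(s) > \theta + \varepsilon),$

(2.8)  $P(\bar\theta_n(s) < \theta - \varepsilon) \le P(\ell_n^{(1)}(\theta - \varepsilon \mid s) < 0) \le P(\underline\theta_n(s) \le \theta - \varepsilon).$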


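If the likelihood equation (1.3) has a unique root for every s then
$\underline\theta_n(s) = \hat\theta_n(s) = \bar\theta_n(s)$. Hence inequalities
(2.7) and (2.8) yield the following equalities:

(2.9)  $P(\hat\theta_n(s) \ge \theta + \varepsilon) = P(\ell_n^{(1)}(\theta + \varepsilon \mid s) \ge 0),$

(2.10)  $P(\hat\theta_n(s) \le \theta - \varepsilon) = P(\ell_n^{(1)}(\theta - \varepsilon \mid s) \le 0).$

Under this assumption of a unique root of the likelihood equation (hence a
unique m.l.e. $\hat\theta_n(s)$), Fu (1973) proved

(2.11)  $\lim_{\varepsilon\to 0}\, \lim_{n\to\infty} \frac{1}{n\varepsilon^2} \log P(|\hat\theta_n(s) - \theta| \ge \varepsilon) = -I(\theta)/2.$

Hence the m.l.e. $\hat\theta_n(s)$ is an asymptotically efficient estimator in
the Bahadur sense.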


For the Cauchy distribution, it is clear that the condition of a unique root
of the likelihood equation is violated. Whether, in this case, the m.l.e.
$\hat\theta_n(s)$ remains an asymptotically efficient estimator was listed as a
conjecture in the papers of Fu (1982) and Rubin and Rukin (1983). The following
theorem gives a positive answer to the conjecture.


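Theorem: For $\varepsilon > 0$ sufficiently small, we have

(2.12)  $\lim_{n\to\infty} \frac{1}{n} \log P(|\hat\theta_n(s) - \theta| > \varepsilon) = -\beta(\hat\theta, \theta, \varepsilon),$

where

(2.13)  $\beta(\hat\theta, \theta, \varepsilon) = \varepsilon^2 (1 + O(\sqrt{\varepsilon}))/4,$

and $O(\sqrt{\varepsilon})$ stands for a quantity such that
$O(\sqrt{\varepsilon})/\sqrt{\varepsilon} \to$ constant as $\varepsilon \to 0$.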
The mathematical proof of this theorem is hard and tedious; we defer it to the
next section.

One may note that the above exponential rate
$\beta(\hat\theta, \theta, \varepsilon)$ is independent of $\theta$. This is
due to the fact that $\theta$ is a location parameter. Similarly the Fisher
information $I(\theta)$ of the Cauchy distribution is

(2.14)  $I(\theta) = E\left(\frac{2(\theta - X)}{1 + (X - \theta)^2}\right)^2 = \frac{1}{2},$

which is also independent of $\theta$. Results (2.13) and (2.14) together give

(2.15)  $\lim_{\varepsilon\to 0} \beta(\hat\theta, \theta, \varepsilon)/\varepsilon^2 = I(\theta)/2 = \frac{1}{4}.$

Hence the maximum likelihood estimator $\hat\theta_n$ for the location
parameter of the Cauchy distribution is an asymptotically efficient estimator
in the Bahadur sense.
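For completeness, the value of the Fisher information of the Cauchy location family can be checked by a short calculation. The following worked step is our own sketch, using the substitution x - theta = tan u (an assumption of this illustration, not a step taken from the text):

```latex
\[
x - \theta = \tan u, \qquad f(x \mid \theta)\,dx = \frac{du}{\pi}, \qquad
\frac{2(x-\theta)}{1+(x-\theta)^{2}} = \frac{2\tan u}{1+\tan^{2}u} = \sin 2u,
\]
\[
I(\theta) = \frac{1}{\pi}\int_{-\pi/2}^{\pi/2} \sin^{2} 2u \, du
          = \frac{1}{\pi}\cdot\frac{\pi}{2} = \frac{1}{2},
\qquad \frac{I(\theta)}{2} = \frac{1}{4}.
\]
```

The result does not depend on theta, as expected for a location parameter.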

Properties (a) and (b) of the likelihood equation (1.7) of the Cauchy
distribution cannot be altered. In view of Reeds' results and ours, however,
the m.l.e. $\hat\theta_n$ converges to $\theta$ exponentially at the optimal
rate. One should therefore not consider the Cauchy distribution a major example
of the failure of the maximum likelihood method of estimation.

3.  Proof of Main Theorem

Let $X_1, X_2, \ldots$ be a sequence of i.i.d. bounded random variables with
mean zero, $\mathrm{Var}(X_i) = \sigma^2 > 0$, and $|X_i| < M$. Write
$S_n = \sum_{i=1}^{n} X_i$. To prove our theorem we need the following lemmas.


Lemma 1.  For any $\varepsilon > 0$, when n is sufficiently large,


(3.1)  P(|Snj>£n)  <  2  exp{-S-H(l  -  —  )}. 

2 2a^ 


Lemma  2.  For  any  0  <  e  £  when  n  is  sufficiently  large  then 

M 


(3.2)  P(  |  Sn  j  Sen)  >  2  exp{-  —  (1  +  5  )}. 

2  o2  /  o2 


The above lemmas can be proved by the same methods used for Lemma 1 and
Lemma 2 in Chapter X of Petrov (1975). We omit the proofs.


Proof of Main Theorem:

Without loss of generality, we can assume that the true value of $\theta$
equals zero and that $\{X_n\}$, n = 1, 2, ..., is a sequence of i.i.d. standard
Cauchy random variables with common density function

(3.3)  $f(x) = \frac{1}{\pi(1 + x^2)}$

for all $x \in (-\infty, \infty)$.


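The symbol E will stand for expectation with respect to the standard Cauchy
distribution. Let $\theta$ be the variable of the following function, which is
the negative of the log-likelihood function up to an additive constant:

(3.4)  $S_n(\theta) = \sum_{i=1}^{n} \log(1 + (X_i - \theta)^2).$

It follows that

(3.5)  $\frac{1}{n} E\, S_n(\theta) = \log(\theta^2 + 4).$

Let

(3.6)  $\delta = \frac{1}{4}\left(\log(4 + \tfrac{1}{4}) - \log 4\right) = \frac{1}{4} \log(1 + \tfrac{1}{16}) \approx 0.0152,$

and define

(3.7)  $A_n(\theta) = \{S_n(\theta) \le n(\log 4 + 2\delta)\},$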




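then we have, for any $t \in (0, 1/4)$,

(3.8)  $P(A_n(\theta)) = P\left\{\sum_{i=1}^{n} \log\frac{\theta^2 + 4}{1 + (X_i - \theta)^2} \ge n\left(\log\frac{4 + \theta^2}{4} - 2\delta\right)\right\}$

       $\le \left[\exp\left\{-nt\left(\log\frac{\theta^2 + 4}{4} - 2\delta\right)\right\}\right] \left(E \exp\left[t \log\frac{\theta^2 + 4}{1 + (X_1 - \theta)^2}\right]\right)^n$

       $= \left[\exp\left\{-nt\left(\log\frac{\theta^2 + 4}{4} - 2\delta\right)\right\}\right] \left\{1 + \sum_{k=2}^{\infty} \frac{t^k}{k!}\, E\left(\log\frac{\theta^2 + 4}{1 + (X_1 - \theta)^2}\right)^k\right\}^n.$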
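The identity (3.5) and the numerical value of the constant in (3.6) can be checked directly. The sketch below is an illustration under our own assumptions: the expectation is computed with the substitution X = tan(u), u uniform on (-pi/2, pi/2), and a plain midpoint rule. The function name `mean_log_term` and the chosen values of theta are hypothetical.

```python
import math

def mean_log_term(theta, n_steps=200_000):
    """Approximate E log(1 + (X - theta)^2) for a standard Cauchy X (midpoint rule)."""
    h = math.pi / n_steps
    total = 0.0
    for k in range(n_steps):
        u = -math.pi / 2 + (k + 0.5) * h
        # X = tan(u) is standard Cauchy when u is uniform on (-pi/2, pi/2).
        total += math.log1p((math.tan(u) - theta) ** 2)
    return total * h / math.pi

# Constant of (3.6): one quarter of log(4 + 1/4) - log 4.
delta = 0.25 * math.log(1.0 + 1.0 / 16.0)

for theta in (0.0, 0.5, 2.0):
    print(theta, mean_log_term(theta), math.log(theta ** 2 + 4.0))
print("delta =", delta)
```

The integrand has an integrable logarithmic singularity at the endpoints, so the midpoint rule converges only slowly there; the agreement with log(theta^2 + 4) is therefore approximate rather than to machine precision.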

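If $\theta > 0$ and $X_1 \le \theta/2$ then

(3.9)  $\log\frac{\theta^2 + 4}{1 + (X_1 - \theta)^2} \le \log\frac{\theta^2 + 4}{1 + \theta^2/4} = \log 4.$

If $\theta > 0$ and $X_1 > \theta/2$ then

(3.10)  $P(X_1 > \theta/2) \le \frac{2}{\pi\theta} < \frac{1}{\theta}$ and

(3.11)  $\log\frac{\theta^2 + 4}{1 + (X_1 - \theta)^2} \le \log(\theta^2 + 4).$

Thus (3.9), (3.10) and (3.11) yield

(3.12)  $E\left(\log\frac{\theta^2 + 4}{1 + (X_1 - \theta)^2}\right)^k \le (\log 4)^k + \frac{1}{\theta}\left(\log(\theta^2 + 4)\right)^k.$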


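Inequalities (3.8) and (3.12) imply that for $\theta \ge 1/2$

(3.13)  $P(A_n(\theta)) \le \exp\left\{-n\left[t\left(\log\frac{\theta^2 + 4}{4} - 2\delta\right) - A t^2\right]\right\},$

where

(3.14)  $A = \log^2 4 + \frac{1}{2} \sup_{\theta \ge 1/2} \left(\frac{1}{\theta}\,(\theta^2 + 4)^{1/4} \log^2(\theta^2 + 4)\right).$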

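Taking $t = \delta/A$ and inserting it into the inequality (3.13), we get, for
$\theta \ge 1/2$,

(3.15)  $P(A_n(\theta)) \le \exp\left\{-n\delta A^{-1}\left(\log\frac{\theta^2 + 4}{4} - 3\delta\right)\right\}.$

Similarly, the inequality (3.15) also holds for $\theta \le -1/2$.

Let $\theta_k = \frac{1}{2} + k\delta$, k = 0, 1, 2, .... It follows from (3.15)
that we have, for n sufficiently large,

(3.16)  $\sum_{k=0}^{\infty} P(A_n(\theta_k)) \le \sum_{k=0}^{\infty} \exp\left\{-n\delta A^{-1}\left(\log\frac{4 + \theta_k^2}{4 + 1/4} + \delta\right)\right\} \le 2 \exp\{-n\delta^2 A^{-1}\}.$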


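Note that

(3.17)  $\left|\frac{1}{n} S_n^{(1)}(\theta)\right| = \left|\frac{1}{n} \sum_{i=1}^{n} \frac{2(\theta - X_i)}{1 + (\theta - X_i)^2}\right| \le 1.$

If $\theta \in (\theta_k, \theta_{k+1})$ and
$\frac{1}{n} S_n(\theta) \le \log 4 + \delta$ then, for some $\theta^*$ between
$\theta_k$ and $\theta$,

(3.18)  $\frac{1}{n} S_n(\theta_k) = \frac{1}{n} S_n(\theta) + \left(\frac{1}{n} S_n(\theta_k) - \frac{1}{n} S_n(\theta)\right)$

        $= \frac{1}{n} S_n(\theta) - (\theta - \theta_k)\left(\frac{1}{n} S_n^{(1)}(\theta^*)\right)$

        $\le \log 4 + 2\delta.$

Hence

(3.19)  $P\left(\inf_{\theta \ge 1/2} \frac{1}{n} S_n(\theta) \le \log 4 + \delta\right) \le P\left(\bigcup_{k=0}^{\infty} \left\{\frac{1}{n} S_n(\theta_k) \le \log 4 + 2\delta\right\}\right)$

        $\le \sum_{k=0}^{\infty} P(A_n(\theta_k)) \le 2 \exp\{-n\delta^2/A\}.$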
k=0 
By the same token we have

(3.20)  $P\left(\inf_{\theta \le -1/2} \frac{1}{n} S_n(\theta) \le \log 4 + \delta\right) \le 2 \exp\{-n\delta^2/A\}.$

Inequalities (3.19) and (3.20) imply that

(3.21)  $P\left(\inf_{|\theta| \ge 1/2} \frac{1}{n} S_n(\theta) \le \log 4 + \delta\right) \le 4 \exp\{-n\delta^2/A\}.$


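On the other hand, we have for $t \in (0, 1/4)$

(3.22)  $P\left(\frac{1}{n} S_n(0) \ge \log 4 + \delta\right) = P\left(\frac{1}{n} \sum_{i=1}^{n} \log\frac{1 + X_i^2}{4} \ge \delta\right) \le [\exp\{-n\delta t\}] \left(E \exp\left\{t \log\frac{1 + X_1^2}{4}\right\}\right)^n.$

Since

$E \exp\left\{t \log\frac{1 + X_1^2}{4}\right\} < \infty$  for all $t < 1/2$

and

$E \log\frac{1 + X_1^2}{4} = 0,$

there exists a constant A > 0 such that

(3.23)  $E \exp\left\{t \log\frac{1 + X_1^2}{4}\right\} \le 1 + \frac{t^2}{2} A \le \exp\left\{\frac{t^2}{2} A\right\}$

for all sufficiently small positive t. Therefore

(3.24)  $P\left(\frac{1}{n} S_n(0) \ge \log 4 + \delta\right) \le \exp\left\{-n\delta t + n\frac{t^2}{2} A\right\}.$

Taking $t = \delta/A$, we have

(3.25)  $P\left(\frac{1}{n} S_n(0) \ge \log 4 + \delta\right) \le \exp\{-n\delta^2/2A\}.$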
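The choice t = delta/A in the last step is the minimizer of the exponent in (3.24). A short worked version of that step, written by us as an illustration and assuming delta/A is small enough for (3.23) to apply:

```latex
\[
g(t) = -\delta t + \tfrac{1}{2} A t^{2}, \qquad
g'(t) = -\delta + A t = 0 \iff t = \delta / A,
\]
\[
g(\delta/A) = -\frac{\delta^{2}}{A} + \frac{\delta^{2}}{2A} = -\frac{\delta^{2}}{2A},
\qquad\text{so}\qquad
\exp\{n\, g(\delta/A)\} = \exp\{-n\delta^{2}/2A\}.
\]
```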

Now we consider the case when $\theta \in (-\frac{1}{2}, \frac{1}{2})$. Note
that

(3.26)  $S_n^{(2)}(\theta) = 2 \sum_{i=1}^{n} \frac{1 - (\theta - X_i)^2}{(1 + (\theta - X_i)^2)^2}$

and

(3.27)  $E\, \frac{1}{n} S_n^{(2)}(\theta) = \frac{8 - 2\theta^2}{(\theta^2 + 4)^2}.$

Hoeffding (1965) proved that if $X_1, \ldots, X_n$ are independent and
$a_i \le X_i \le b_i$ then, for t > 0, the following inequality holds:

$P\left(\sum_{i=1}^{n} X_i - \sum_{i=1}^{n} E X_i \ge nt\right) \le \exp\left\{-2n^2 t^2 \Big/ \sum_{i=1}^{n} (b_i - a_i)^2\right\}.$


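Since

(3.28)  $-\frac{1}{4} \le \frac{2(1 - (X_i - \theta)^2)}{(1 + (X_i - \theta)^2)^2} \le 2,$

it follows from Hoeffding's inequality that we have

(3.29)  $P\left(\frac{1}{n} S_n^{(2)}(\theta) \le \frac{1}{5}\right)$

        $= P\left(\frac{1}{n} \sum_{i=1}^{n} \left[\frac{8 - 2\theta^2}{(\theta^2 + 4)^2} - \frac{2(1 - (X_i - \theta)^2)}{(1 + (X_i - \theta)^2)^2}\right] \ge \frac{8 - 2\theta^2}{(\theta^2 + 4)^2} - \frac{1}{5}\right)$

        $\le P\left(\frac{1}{n} \sum_{i=1}^{n} \left[\frac{8 - 2\theta^2}{(\theta^2 + 4)^2} - \frac{2(1 - (X_i - \theta)^2)}{(1 + (X_i - \theta)^2)^2}\right] \ge \frac{1}{25}\right)$

        $\le \exp\left\{-2n\left(\frac{4}{225}\right)^2\right\}$

for all $|\theta| \le 1$.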
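The ingredients of this Hoeffding step can be checked numerically. The sketch below is an illustration under our own assumptions: it evaluates the summand of (3.26) on a grid to confirm that its range is approximately [-1/4, 2], evaluates the mean in (3.27), and computes the Hoeffding bound for a given deviation t, which is left as a free parameter. The function names and the grid are hypothetical.

```python
import math

def second_derivative_term(u):
    """Summand of (3.26): 2(1 - u^2) / (1 + u^2)^2 with u = theta - X_i."""
    return 2.0 * (1.0 - u * u) / (1.0 + u * u) ** 2

def expected_curvature(theta):
    """Right-hand side of (3.27): (8 - 2 theta^2) / (theta^2 + 4)^2."""
    return (8.0 - 2.0 * theta ** 2) / (theta ** 2 + 4.0) ** 2

def hoeffding_bound(n, t, width):
    """exp{-2 n^2 t^2 / (n * width^2)} for n summands, each with range `width`."""
    return math.exp(-2.0 * n * t * t / (width * width))

# Range of the summand on a fine grid of u values in [-50, 50].
grid = [k / 1000.0 for k in range(-50_000, 50_001)]
values = [second_derivative_term(u) for u in grid]
lower, upper = min(values), max(values)
print("range of summand:", lower, upper)

# Smallest mean curvature on |theta| <= 1 is attained at the endpoints.
print("mean curvature at theta = 0 and 1:", expected_curvature(0.0), expected_curvature(1.0))

# Bound for a hypothetical sample size and deviation, with width = upper - lower.
print("Hoeffding bound:", hoeffding_bound(1000, 0.04, upper - lower))
```

The minimum of the summand occurs at u^2 = 3 and the maximum at u = 0, which is why the range has width 9/4.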


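Again, since

(3.30)  $\left|\frac{1}{n} S_n^{(3)}(\theta)\right| = \left|\frac{4}{n} \sum_{i=1}^{n} \frac{(\theta - X_i)(3 - (\theta - X_i)^2)}{(1 + (X_i - \theta)^2)^3}\right| \le 6,$

if $\theta \in \left(\frac{2k}{75}, \frac{2(k+1)}{75}\right)$ and
$\frac{1}{n} S_n^{(2)}(\theta) \le \frac{1}{25}$, then, for some $\xi_k$ between
$\frac{2k}{75}$ and $\theta$,

(3.31)  $\frac{1}{n} S_n^{(2)}\left(\frac{2k}{75}\right) = \frac{1}{n} S_n^{(2)}(\theta) + \left(\frac{1}{n} S_n^{(2)}\left(\frac{2k}{75}\right) - \frac{1}{n} S_n^{(2)}(\theta)\right)$

        $= \frac{1}{n} S_n^{(2)}(\theta) + \left(\frac{2k}{75} - \theta\right) \frac{1}{n} S_n^{(3)}(\xi_k)$

        $\le \frac{1}{25} + \frac{2}{75} \cdot 6 = \frac{1}{5}$

for k = -36, -35, ..., 35, 36. Hence

(3.32)  $P\left(\inf_{|\theta| < 1/2} \frac{1}{n} S_n^{(2)}(\theta) \le \frac{1}{25}\right) \le \sum_{k=-36}^{36} P\left(\frac{1}{n} S_n^{(2)}\left(\frac{2k}{75}\right) \le \frac{1}{5}\right) \le 75 \exp\left\{-2n\left(\frac{4}{225}\right)^2\right\}.$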


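Now we define the following events

(3.33)  $A_{n1} = \left\{\inf_{|\theta| \ge 1/2} \frac{1}{n} S_n(\theta) \le \log 4 + \delta\right\},$

        $A_{n2} = \left\{\frac{1}{n} S_n(0) \ge \log 4 + \delta\right\},$ and

        $A_{n3} = \left\{\inf_{|\theta| < 1/2} \frac{1}{n} S_n^{(2)}(\theta) \le \frac{1}{25}\right\}.$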

Note that the m.l.e. $\hat\theta_n(s)$ minimizes $S_n(\theta)$ and satisfies
the likelihood equation $S_n^{(1)}(\theta) = 0$. If $A_{n1}^c$ and $A_{n2}^c$
occur simultaneously, then $\hat\theta_n \in (-\frac{1}{2}, \frac{1}{2})$. If
$A_{n3}^c$ occurs then $S_n^{(1)}(\theta)$ is strictly increasing on the
interval $(-\frac{1}{2}, \frac{1}{2})$. If $A_{n1}^c$, $A_{n2}^c$, and
$A_{n3}^c$ simultaneously occur then, for $\varepsilon \in (0, \frac{1}{2})$,
$\hat\theta_n > \varepsilon$ if and only if $S_n^{(1)}(\varepsilon) < 0$. Hence

(3.34)  $P(\hat\theta_n(s) > \varepsilon,\ A_{n1}^c,\ A_{n2}^c \text{ and } A_{n3}^c) = P(S_n^{(1)}(\varepsilon) < 0,\ A_{n1}^c,\ A_{n2}^c \text{ and } A_{n3}^c).$

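Note that

(3.35)  $E\, \frac{2(\varepsilon - X_1)}{1 + (\varepsilon - X_1)^2} = \frac{2\varepsilon}{\varepsilon^2 + 4};$

by Lemmas 1 and 2, we have for $\varepsilon > 0$ sufficiently small

(3.36)  $P(S_n^{(1)}(\varepsilon) < 0) = P\left(\sum_{i=1}^{n} \left(\frac{2\varepsilon}{\varepsilon^2 + 4} - \frac{2(\varepsilon - X_i)}{1 + (\varepsilon - X_i)^2}\right) > n\, \frac{2\varepsilon}{\varepsilon^2 + 4}\right)$

        $= \exp\left\{-\frac{n}{2} \left(\frac{2\varepsilon}{\varepsilon^2 + 4}\right)^2 \frac{1}{\sigma^2}\, (1 + O(\sqrt{\varepsilon}))\right\}$

        $= \exp\left\{-\frac{n}{4}\, \varepsilon^2 (1 + O(\sqrt{\varepsilon}))\right\},$

where $\sigma^2$ denotes the variance of the summands, which tends to
$I(0) = \frac{1}{2}$ as $\varepsilon \to 0$.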

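From the above results (3.20), (3.25), (3.32), (3.34) and (3.36), we have

(3.37)  $P(\hat\theta_n(s) > \varepsilon) = \exp\left\{-\frac{n}{4}\, \varepsilon^2 (1 + O(\sqrt{\varepsilon}))\right\},$

for $\varepsilon > 0$ sufficiently small.

Similarly we have

(3.38)  $P(\hat\theta_n(s) < -\varepsilon) = P(S_n^{(1)}(-\varepsilon) > 0) = \exp\left\{-\frac{n}{4}\, \varepsilon^2 (1 + O(\sqrt{\varepsilon}))\right\},$

for $\varepsilon > 0$ sufficiently small.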


Since the Cauchy distribution is continuous and satisfies Bahadur's condition
(Bahadur 1971, p. 9), equations (3.37) and (3.38) yield

$P(|\hat\theta_n(s)| > \varepsilon) = \exp\left\{-\frac{n}{4}\, \varepsilon^2 (1 + O(\sqrt{\varepsilon}))\right\}$

for $\varepsilon > 0$ sufficiently small. This completes the proof.